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I. INTRODUCTION 

Many mathematical methods originating from theoret- 
ical physics have found use in completely different con- 
texts, among them the variational approach to the ther- 
modynamics of complicated systems, lying at the basis 
of e.g. the mean field approximation to spin systems. 
This has been successfully used in heuristic methods in 
the context of combinatorial optimization, for problems 
that allow a simple formulation in terms of Ising or Potts 
spins. For other kinds of combinatorial optimization 
problems, in particular assignment problems, a similar 
approach is more difficult to achieve; the related difficul- 
ties is the focus of this paper. 

In an instance of a combinatorial optimization prob- 
lem, a cost function is defined in terms of a set of dis- 
crete variables, and the object is to find an optimal state 
- a particular state of the variables that minimizes the 
cost function; in other words the ground state, if the cost 
function is interpreted as a Hamiltonian. In cases where 
the variables are of a binary nature, such a problem thus 
amounts to finding the ground state of an Ising spin sys- 
tem (spin glass) with a given Hamiltonian. 

For a small problem instance, an exact method can 
be used to solve it exactly. In addition to more 
problem-specific methods, Branch-and-Bound [jjj pro- 
vides a generic class of exact methods, where an in- 
telligent (as opposed to exhaustive) tree-search of the 
phase-space is performed, disregarding parts that can be 
ruled out beforehand. Another interesting approach is 
Simulated Annealing M, where a standard Monte-Carlo 
method is used to simulate the immersion of the system 
in a heat bath, starting at a high temperature, which is 
slowly lowered (annealing) in the course of the simula- 
tion. In the limit of very slow annealing, this stochastic 
method is guaranteed to yield the ground state as T — > 

For a large system, however, finding the exact ground 
state can be a very time-consuming task. For a large class 
of problems (NP-hard), the expected time required scales 
worse than any polynomial in the system size, and the 
quest for the exact ground state must be given up. In- 
stead, one has to resort to more or less dedicated heuris- 
tics, to meet the more modest goal of finding states with 
as low a cost as possible. 

Problems that can be formulated in terms of Potts or 
Ising spins admit a versatile heuristic method, Determin- 
istic Annealing, based on the iterative solution of the 
equations associated with the mean field (MF) approx- 
imation of the system at hand, combined with a slow 
decrease in temperature. With the MF variables inter- 
preted as neuron activities, the resulting dynamics at 
each temperature is that of a generalized Hopfield (or 
connectionist) network 0]. Deterministic (MF) anneal- 
ing has been successfully applied to a range of problem 
types, see e.g. §-§]. 

The MF approximation is most conveniently derived 



from a variational approach, where the proper Boltzmann 
distribution based on the true Hamiltonian is approxi- 
mated by a factorized distribution, constrained to be the 
product of individual single-spin distributions, each of 
which can be parameterized by the corresponding single 
spin average. The optimal parameters of the approxi- 
mating distribution minimize an associated free energy. 

Thus, the minimization of the cost function in a dis- 
crete phase space is replaced by the minimization of an 
effective cost function in a continuous parameter space, 
which in suitable coordinates (the spin averages) inter- 
polates between the discrete states of the original phase 
space. This gives an advantage as compared to a local 
optimization method confined to the discrete space, due 
to the possibility of taking shortcuts. 

A somewhat different type of optimization problems is 
given by assignment problems, where an optimal match- 
ing (assignment) between two sets of objects is desired, as 
defined by a given cost function. While certain subclasses 
of assignment problems, like e.g. linear assignment where 
the cost is linear in the assignment matrix, can be solved 
exactly in polynomial time, the generic assignment prob- 
lem is a non-polynomial one. 

For nonlinear assignment problems, an obvious gen- 
eralization of the MF-based deterministic annealing ap- 
proach is lacking, mainly due to the absence of a simple 
and natural analog of the MF approximation. While a 
linear cost appears to be the most sensible choice for 
a variational Ansatz, it does not lead to the simplicity 
usually associated with the MF approximation. Never- 
theless, it is possible to exploit the linear Ansatz to define 
dedicated deterministic annealing schemes for non-linear 
assignment, and we will investigate the difficulties and 
peculiarities involved in connection with this. A major 
drawback with this approach, however, is that the time 
required is exponential in the problem size, and so its 
practical usefulness is limited. 

A popular alternative, to avoid the complexity of such 
an approach, is to tweak MF annealing as defined for 
Potts systems to make it apply to assignment prob- 
lems. We will discuss two common methods of this type, 
Potts-plus-Penalty Q and Soft Assign , point out their 
strong and weak points, and where appropriate suggest 
improvements to the existing state of the art. 

To illustrate the implementation on a specific problem 
type, and to gauge the effect of the suggested improve- 
ments, a suitable subset of the methods will be applied 
to a small testbed of simple applications. 

The article is structured as follows: In Sec. II, the ba- 
sic idea of variational methods in general is described. In 
Sec. Ill, MF Annealing for a Potts system is derived from 
a variational MF approximation, and briefly described. 
Sec. IV contains a general discussion of assignment prob- 
lems, and defines some notation. The polynomial prob- 
lem of Linear Assignment is briefly discussed there. In 
Sec. V, we discuss the definition of proper determinis- 
tic annealing methods dedicated to assignment problems. 
Sec. VI contains a discussion of existing tweaked Potts- 



f 



based MF approaches, and suggestions for improvements. 
In Sec. VII we compare some of the suggested methods 
on a few simple test problems. Finally, Sec. VIII contains 
our conclusions. 



II. MF IN GENERAL - VARIATIONAL 
APPROACH 

The MF approximation, as it is used in MF anneal- 
ing for binary and Potts systems, is most conveniently 
derived from a general variational principle. 

Given a complicated cost function H(s) of the variables 
of interest, s, the idea is to approximate its associated 
Boltzmann distribution oc exp(— H/T) (at a fixed artifi- 
cial temperature T) with one derived from a simpler cost 
function Hy(s,X) (e.g. a linear one), with a set of free 
parameters A (the coefficients in the linear case). The 
parameter values are then determined by minimization 
of the associated free energy Fy (A), 

F v (X) = (H)-TS=-TlogZ + (H-H v ), (1) 

where (•) stands for an expectation value in the approx- 
imating distribution, and Z denotes the corresponding 
partition function J2 s e~}q>(—Hy(s)/T). S is the associ- 
ated entropy, given by — ^2 s p(s) logp(s), with p(s) the 
probability of state s, p(s) = exp(—H v (s)/T)/Z. 

The variational free energy is bounded from be- 
low by the true free energy, F = — Tlog(Z ) = 
— Tlog ^2 ex P( — H(s)/T). The condition for an ex- 
ternum of Fv with respect to parameter variations <5A 
is 

SF V =(6H V (H-Hy)) c = 0, (2) 

where (ab) c stands for the connected expectation value 
(or cumulant) (ab) — (a) (b). Thus, for each parameter 
A Q , we must have 

Although (^) may permit multiple solutions, including 
saddle-points or local minima, it is commonly used in 
the search for an optimal set of parameter values. 

Note that since the expectation values involve the 
variational Boltzmann distribution cx exp(— Hy/T) and 
hence depend on the temperature T, so will the optimal 
parameter values. 

A particularly simple special case results when the vari- 
ational cost function depends linearly on the parame- 
ters, 

H v (s;\) = Y / *aE a (s). (4) 

a 

Then, eq. (|^) for an extremum takes the simple form 



Y,(E a E b ) c \ b = (E a H) c . (5) 

b 

This has the form of a matrix equation, (EE) X = 
(EH) C , and a straightforward strategy for finding a so- 
lution is given by iteratively updating the parameters 
according to 

X^(EE)- 1 (EH) C , (6) 

followed by the corresponding updates of the expectation 
values (EE) , (EH) , which depend on the parameters 
via the Boltzmann distribution. 



III. MF ANNEALING FOR POTTS SYSTEMS 

In order to understand the problems associated 
with defining a deterministic annealing approach to 
assignment-based problems, it is instructive to first re- 
view how simpler types of systems are treated. 

A simple g-state multiple-choice variable (Potts spin) 
is conveniently represented by a (/-dimensional vector s, 
with the allowed states represented by the q principal 
vectors (1, 0,0,.. .), (0, 1,0,.. .), etc., with the position 
of the single nonvanishing component indicating which 
state is "on". These vectors are linearly independent, 
and point to the corners of a regular g-simplex. 

As a result, any single-spin cost function can be written 
as a linear function in s, H (s) = C • s, where C a is the 
cost associated with state a. For a system of N Potts 
spins, it follows that an arbitrary cost function can be 
written in a multilinear form, 

H(s) = a + ^ biaSia + - ^ C ^.JbS la Sjb + ■ ■ ■ , (7) 

i,a i.a j=£i,b 

where Si a denotes the a-th component of the i-th spin. 
Solving the associated optimization problem corresponds 
to finding the ground state of the system, i.e. the com- 
bination of states of the V Potts spins that minimizes 
H(s). 

The MF approximation to such a system results from 
a variational approach, corresponding to an optimal ap- 
proximation of the non-linear cost H(s) in terms of a 
linear one, 

Hv(s) = j2 c i- s i=J2 c ™ s ™- ( 8 ) 

i ia 

The coefficients C; a constitute the parameters, and are 
to be chosen so as to minimize the variational free energy 
Fy. It is convenient to express Fy in terms of the spin 
averages in the variational distribution, the (mean field 
spins) Vj, with components in [0, 1] amounting to 

(n\- I \ cxp (-da/T) 

ViaiC) = {sia)v = EbO^(-Qb/Ty (9) 
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In the MF approximation, Vi a corresponds to the prob- 
ability for spin i to be in state a, consistently with the 
identity J2a v ia = 1- The MF spins thus interpolate be- 
tween the discrete states of the original spins; in terms 
of them the variational free energy evaluates to 



F v (v) = T Vi a log Vi a +H(v), 



(10) 



which can be minimized with respect to the normalized 
MF spins by adding a Lagrange parameter A; for the 
normalization of each MF spin v^. The condition for a 
extremum, equivalent to eq. (0) amounts to 



dH(v)/dv ia + T(l + logw m ) = Aj 



(11) 



which, together with the normalization that fixes the A^ 
values, gives the variational coefficients C up to an unim- 
portant constant in terms of v as 



C ia (v) 



dH{v) 

dv ia 



(12) 



Eqs. (g) and (|12|) define the MF equations. 

The MF approximation corresponds to neglecting the 
correlations between the different spins, since the linear 
variational Ansatz used is the most general factorizcd 
distribution Pv(s) — Y\iPi{si), where the different Potts 
spins obey independent distributions. 

MF annealing corresponds to solving the MF equations 
iteratively, starting with a high T, where a fixed point 
with Vi a ~ 1/N will dominate, and slowly lowering T. 
At low enough T, the MF spins will be forced on shell, 
i.e. for Vi a ~ Sk G 0, 1, and a suggested solution can be 
extracted. 



IV. ASSIGNMENT PROBLEMS - GENERAL 
DISCUSSION 

A. Notation 

When it comes to permutation/assignment problems, 
we have to distinguish between single assignment prob- 
lems and multiple assignment problems, the latter being 
based on several assignments. 

To begin with, we will consider the simpler case of a 
single assignment, where an optimal matching between 
two sets of TV objects is desired, i.e. for each object i in 
the first set, an object a in the second set is to be chosen, 
such that different i are assigned to different a. There are 
obviously AH ways to accomplish this. 

A compact way to describe an assignment is by means 
of the associated assignment matrix, i.e. an TV x TV dou- 
bly stochastic matrix s with elements in {0, 1}, such that 
Si a = 1 if i is assigned to a, and otherwise (for a some- 
what different encoding of assignment problems as used 
in deterministic annealing, see Obviously, we must 



have precisely one unit element in each row as well as in 
each column of s, consistently with 



~ i' ■ 



= i. 



(13) 



Then, a given single assignment problem can be described 
in terms of a cost function H(s), which is to be mini- 
mized. 

Note, however, that, in contrast to e.g. the Potts case, 
the most general cost function for single assignment (for 
TV > 2) is not linear in s (see Appendix); in the worst 
case a polynomial of degree TV — 1 is needed. 

Alternatively, the cost function can be viewed as an 
explicit function over the permutation group P/y: Each 
group element g is associated with an individual cost C g , 
defining an clement of an TV!-dimensional cost vector C . 
The relation to the formulation in terms of the assign- 
ment matrices s is C g = H(s g ), where s g is the particular 
assignment matrix representing g. 



B. Thermodynamics for a single nonlinear 
assignment 

The thermodynamics af a system consisting of a single 
TV x TV assignment with an arbitrary cost function is not 
difficult to express, when formulated in terms of a cost 
vector C over group space. The assignment then acts 
as a single TV!-state Potts spin, that can be described 
by an TV!-dimensional vector S with precisely one unit 
component, while the rest is zero: S g G {0, 1}, J2 g Sg — 
1. 

At an artificial temperature T, the probability of a 
particular state g amounts to 



Vg = (S g ) 



exp(-C g /T) 

£ h ex P (-cyry 



(14) 



As T — ► 0, the distribution gets increasingly concentrated 
at the state (group element) with the lowest cost. 

When viewed this way, the difficulty lies entirely in the 
huge number of states involved if TV is large. In order to 
compute one component of V , one has to know the costs 
for all the TV! states. For a generic cost function, the 
associated computational complexity is non-polynomial 
in the size TV of the system. 

Thus, for a generic large assignment problem, one has 
to make do with some kind of heuristic. 



C. Linear assignment 

Certain classes of assignment problems can be solved 
exactly in polynomial time. One such class is linear as- 
signment, where the cost function is constrained to be 
linear in the assignment matrix s, 
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H(s) = ^2c ij s i 



(15) 



defining an N x N cost matrix c. 

This problem corresponds essentially to a linear pro- 
gramming one, and can be solved in polynomial time, 
using e.g. the so called Hungarian algorithm {Ujjlj, 
based on the fact that the addition of terms to c de- 
pending on row or column alone, c y - — > cy + a, + 6j, 
is equivalent to adding a constant to the cost function, 
H(s) — > i? (s) + J^,. a, + ^ 6j . This is used to iteratively 
modify the cost matrix until it takes a form where it has 
zeros on a set of elements corresponding to the optimal 
assignment, and non-negative values elsewhere. 

Unfortunately, this doesn't help when computing ther- 
mal averages at a finite T; this is still a non-polynomial 
task. Thus, e.g., the expectation value of sy is given in 
terms of a matrix M, obtained from c by elementwise 
exponentiation, 



Mi. 



exp(-cy/T), 



P(M) ■ 



(16) 



(17) 



Here, P(M) is the permanent jl2] of M; it has some 
similarities to the determinant, being the sum of all the 
possible products of N elements in M, one in each row 
and one in each column, but with no minus signs in con- 
trast to the case for the determinant. Similarly, Py(M) 
is the subpermanent of M, obtained by removing row i 
and column j from M, and computing the permanent of 
the remaining (N — 1) x (N — 1) matrix. 

The expression for Vij in eq. (|17j) can be derived from 



[Sij) - Zdc i3 
where Z is the partition function, 



(18) 



P(M), ensuring that eq. (17) yields a doubly stochastic 
matrix v. 

Similarly, the expectation value of the product of two 
elements of s becomes 

M M Pjj{M) M l3 M kl P lkjl (M) 



P(M) 



P{M) 



where Pik,ji(M) is a subpermanent obtained as the per- 
manent of the submatrix where rows i, k and columns j, I 
are removed, if i ^ k and j ^ I; else it is zero. Thus 
MijMkiPik,jl(M) sums up the terms in P(M) that con- 
tain M i:j Mki. 

As an aside, replacing the permanents by determinants 
in the expression ( |l7j ) for v would lead to the combi- 
nation Dij(M)/ D(M), exactly corresponding to the j,i 
element of the matrix inverse of M. The elementwise 
product with M would yield a doubly (quasi-)stochastic 
v, where row and column sums are equal to one, albeit 
with elements of both signs. 

However, while the determinant of a matrix can be 
calculated in polynomial time, the permanent in general 
can not, in spite of their similarity - the computational 
time required to compute a generic N x N permanent 
is exponential in N (roughly cx 2 N using e.g. Ryser's 
method mH). 



V. PROPER VARIATIONAL METHOD FOR A 
SINGLE NONLINEAR ASSIGNMENT 

For a large generic single assignment problem, an ex- 
act solution is out of reach, and one has to make do with 
heuristic methods. One possibility then is to consider a 
deterministic annealing approach based on approximat- 
ing the true cost function by a variational one that is 
simpler. 



A. Linear Ansatz for H\ 



Z 



,(g)/T 



= ^exp l-^Tcij/T 

9 \ ijeg j 

= e n ° x p = e n m v = p ( M )' ( ig ) 



g ijeg 



where the restriction ij £ g in the sum over ij means that 
row i and column j are matched in s(g) (so Sij(g) = 1). 
The derivative of Z = P(M) with respect to My yields 
Pij(M), which completes the proof of eq. (p~7|) . 

The combination MijPij(M), appearing in eq. 
gives the sum of those terms in the permanent that con- 
tain the element My. Summing this over i or j yields 



(17) 



The most natural Ansatz for the variational cost func- 
tion Hy is a linear one, 



H v{s) = E c ^' Sl . 



(21) 



with the coefficients cy as free parameters. 

The equation (^) for a minimum of the variational free 
energy then yields: 



{ s ijSki) c cu = {sijH) c 



(22) 



In analogy with eq. (Jg), this is a matrix equation, from 
which c can be formally extracted as 



c= (ssV 1 (sH) 



(23) 
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Note that the N 2 x iV 2 matrix {ss) c is not fully invertible; 
it always has 2N — 1 zero-modes corresponding to the 
addition of redundant terms to c depending on row or 
column index alone. These merely yield row and column 
factors in the exponentiated matrix M, and are of no 
importance for expectation values. 

If H is a low-order polynomial in s, a solution to eq. 
( p2| ) can in principle be computed iteratively in analogy 
to the iterative solution of the Potts MF eqs. (^|, |l2|), by 
repeating the two steps 

1. Calculate the expectation values appearing in eq. 
(p2|); they depend on the present c via M and 
its permanent (and subpermanents) , where Mij = 
exp(-Pdj), in analogy to eqs. ( |l7| , po|) . 

2. Obtain an updated cost matrix c by means of 
eq. (p3|), suitably regularized with respect to the 
zero-modes of (ss) c . [j] 

This can be turned into an annealing approach, by start- 
ing with a high T (low and decreasing T slightly after 
every step, until the "MF variables" = (s^) have sta- 
bilized sufficiently close to zero or one. 

A major drawback of this approach is that it is only 
feasible if N is not too large, since the computation of 
expectation values involves the computation of perma- 
nents, which requires a time exponential in N. 



B. Quadratic H 

The simplest non-linear function of s is quadratic, so 
assume H to be a given quadratic plus linear function in 

s, 



H(s) = ^^AijkiSijSki + y^-B 

ijkl i 



(24) 



involving a symmetric tensor A, Aijki — AkUj , which can 
be assumed to vanish for i = k or j = I. 

The variational eq. ( p2|) corresponding to a linear 
Hv{s) becomes 



iklmn 



^ (SijSkl) c Ckl = ^ E ( Si 3 ( S klSmn)) c Akl; 
kl klmn 

+ (SijS k l) c I Aklmn (Smn) + B k l ) ■ (25) 

kl \ mn / 



Denoting the first term on the RHS of eq. (25) by Fij, 
the effective cost matrix c can be formally extracted as 
c = B + A (s) + (ss)~ F, using a suitably regularized 
matrix inverse. 



*In cases of instability, the change in c can be decreased by 
e.g. a factor of 1/2. 



C. Group theoretical aspects 

It is instructive to view an assignment problem from a 
group-theoretical point of view, where the relevant group 
of course is the permutation group of N elements, de- 
noted by Pn- 

Like any functions over group space, H and Hv can be 
expressed in a unique way as linear combinations of the 
matrix elements of the irreducible representation matri- 
ces of Pn (see Appendix for details). 

Requiring Hy to be linear in s means that its expan- 
sion is constrained to contain only elements from the 
trivial and the fundamental irreducible representations, 
e and f ; thus, it can be written in a unique way in the 
form 



H v (g)=A + J2B ij u{ j (9l 



(26) 



where / stands for the fundamental representation. This 
leads to the probability V g oc exp(—Hv(g)/T), such that 
V g — I, for a particular group element g. The corre- 
sponding version of the variational eq. (pf) becomes 



^E B «Ert) = Ew (27) 

kl g g 

A^VgU^g) +Y, B ^Y, V A(Ml(9) 
g kl g 

= J2 V 9 u U9)H(g), (28) 
g 

corresponding exactly to respectively the trivial and the 
nontrivial parts of eq. (p2[). The trivial part A can be 
eliminated from (|27| ) and inserted into (|2q), yielding 

E ( E vAbWuia) - E v A(a) E Vh4iW) B ™ 

kl \ g g h J 

= E Vg4(9)H(g) - E V B u f ijl3) E V hH{h), (29) 



which is nothing but a disguised version of eq. (|22|); the 
sums over g with V g as a weight correspond to averages. 

For the special case ( p4] ) of a quadratic H(s), the 
corresponding H(g) is constrained to include elements 
from e, f and two additional representations, a and s 
(see Appendix), in its irrep expansion, with dimensions 
d a = (N-l)(N-2)/2 (if N > 2), and d s = N(N -3)/2 
(if N > 3), respectively. 

Although the above analysis illuminates the linear vari- 
ational approach from a group-theoretical point of view, 
the resulting formulation is not of an immediate practical 
interest for large N, since it (at least formally) requires 
the complete enumeration of the AM-dimensional group 
space. 
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D. Proper variational methods for multiple 
nonlinear assignment 

Here we will briefly discuss the possibilities for treating 
systems of several distinct assignments in a variational 
approach. 

1. General additive Ansatz 

In the case of a generic cost function of several assign- 
ments, the most natural choice is to consider a generic ad- 
ditive Ansatz for a variational cost function, correspond- 
ing to a factorized Boltzmann distribution. In principle, 
this corresponds to the MF approximation to a system 
of AH-state Potts variables, and can be treated as such. 
This can be useful, even for large systems (many distinct 
assignments), as long as the individual assignments are 
of low dimensionalities. 



2. Linear Ansatz 

A further simplification results from requiring that the 
variational cost function not only be additive in the dif- 
ferent assignments, but also that the contribution from 
each assignment be linear. 

A possible strategy then is to update the cost matrix 
for one assignment at a time according to eq. (p3|), con- 
sidering the single-assignment averages associated with 
other assignments as constant, and recomputing the 
single-assignment averages associated with the updated 
cost matrix before moving on to update the next assign- 
ment. 



3. Multilinear cost, linear Ansatz 

A particularly simple special case of the above is when 
the exact cost function is a multilinear function of the 
assignments matrices s. For the case of two assignments, 
s*- 1 ) and s^ 2 \ this means a cost function H of the form 

ff(- (1) ,* w ) = EE^.«-S > -2 ) 

ij kl 

(30) 

Then an additive variational cost function automatically 
will be linear 

iW^HE^+E^Mf- (3D 

ij ij 

The resulting updates then amount simply to 



kl 

(32) 

cg , =E^.«(-S , >+ B «- 

ij 

where variational expectation values are understood, 
computable in terms of permanents and sub-permanents 
of the respective cost matrices. 

The generalization to several assignments is straight- 
forward. 



VI. POTTS-BASED MF HEURISTICS FOR 
NONLINEAR ASSIGNMENT 

Although a proper variational approach as described 
above appears to be the natural choice for constructing 
a deterministic annealing approach for assignment prob- 
lems, a major problem is the computational complexity 
involved in the required computation of permanents and 
subpermanents. Indeed, for certain problem classes an 
instance can be solved exactly in the time it takes to 
compute a single permanent. This implies that these 
methods have a rather limited applicability. 

Instead, alternative methods based on Potts spins have 
been used to construct faster deterministic annealing 
methods for non-linear assignment problems. 

A. Potts plus Penalty 

One such method is based on viewing the assignment 
matrix s as an array of A-state Potts spins, one for each 
row. Then the row condition, ^2 a Si a = 1, is automat- 
ically fulfilled. One then adds to H a penalty term for 
breaking of the column normalization, and treats the re- 
sult using Potts MF annealing, based on the modified 
cost function 

^w + fE^-E*-) > ( 33 ) 

with the penalty strength a suitable adjusted. (Of 
course, one might equally well consider the columns as 
Potts spins and add penalties for the rows.) 

This approach, in what follows referred to as PPP 
for Potts plus Penalty, has been successfully applied to 
a number of different problems J5|. The problem with 
a soft penalty is that it involves a delicate tuning of the 
coefficient a; too small, and improper assignments result, 
where two rows are mapped to the same column; too 
large, and the cost is too dominated by the penalty term, 
with a consequent deterioration in performance. 

In PPP, the MF spins are preferably updated in a se- 
rial manner, one row at a time. This leads to the most 
stable MF dynamics, provided H is put in multilinear 
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form. It is easy to see then that the Potts free energy 
( |io| ) becomes a Lyapunov function of the dynamics at a 
fixed temperature. This ensures that the MF dynamics is 
well-behaved in the high-T domain, with the trivial high- 
T fixed point losing stability in a controlled manner. It 
also guarantees that PPP turns into a form of local opti- 
mization in the low-T limit, ensuring the stability of an 
optimal assignment. 



B. Soft Assign 

An obviously ugly feature of the PPP approach is the 
asymmetry in the treatment of rows and columns of s. 
This can be cured in a slightly more advanced Potts- 
inspired method, the Soft Assign (or "Double Potts") ap- 
proach Jicj ], which can be derived as a somewhat ad hoc 
improvement to PPP as follows. 

In PPP, the resulting MF average is given by wy = 
aiMijbj, where Mjj is given by exp(—(dH/dvij)/T) and 
the column factor bj comes from the penalty term, bj = 
exp(a(l — Vij)/T). In contrast, the row factor is the 
usual automatic Potts normalization factor, ensuring the 
exact normalization of rows. 

The idea in Soft Assign is to skip the penalty, and freely 
choose positive row and column factors so as to force 
the exact normalization of both rows and columns. This 
leads to the following problem: Given a matrix M with 
non-negative elements, find vectors of row and column 
factors, a and b, such that the result, 



(34) 



is a doubly stochastic matrix. 

This in fact determines v uniquely, which can be seen 
by defining x$ = logflj, \jj — logbj (a,i,bj are assumed 
positive), and noting that the correct x,y minimize 
the strictly convex function f(x,y) = eXi Mije Vj — 

However, the proper row and column factors can not 
be obtained as simple, closed expressions in the matrix 
elements of M. Instead, the desired doubly stochastic 
matrix v is usually obtained by iteratively modifying M, 
alternatingly normalizing rows and columns until the re- 
sulting matrix is sufficiently close to being correctly nor- 
malized: 

i) Mij _ =^_, 



ii) M k 



Mi. 



iii) Go to i). 

This procedure, which is due to Sinkhorn [flijl , is guaran- 
teed to yield convergence to a unique doubly stochastic 
matrix v. 

For a nonlinear problem, we can obviously identify the 
derivatives dH/dvij with the elements of an estimated 



effective cost matrix, obtained by linearizing the cost- 
landscape in the vicinity of the present point. Note that, 
for a linear assignment problem, SoftAssign leads to ex- 
actly the same initial M as in eq. (|l7]) ; but that a different 
doubly stochastic matrix v is derived from it in eq. (M). 

As an aside, the SoftAssign approach can formally be 
associated with the minimization of an entity reminiscent 
of a free energy, 



F(v)=Tj2vijlogVij+H(v), 



(35) 



with v constrained to be doubly stochastic, e.g. by means 
of adding suitable Lagrange terms (with Lagrange mul- 
tipliers associated with the row and column factors) . Al- 
though F(v) is superficially highly analogous to the Potts 
free energy in eq. ([io|), SoftAssign does not correspond to 
a proper variational approch to approximating the true 
H(s), mainly because the first term is not —T times the 
proper entropy. Nor does v correspond to the expectation 
value of s in an approximating Boltzmann distribution. 

Note that for SoftAssign (unlike the case with a proper 
variational approach), the resulting dynamics is sensitive 
to the precise formulation of H (v ) as an extrapolation of 
H(s) to continuous arguments v. 

Nevertheless, SoftAssign seems theoretically more ap- 
pealing than PPP, in treating rows and columns in a sym- 
metric manner, and in guaranteeing a doubly stochastic 
v, and it has also been successfully applied to various 
types of problems ||, although the method does have 
some weak points as will be discussed below. 



1. Weak points with SoftAssign 

In the SoftAssign approach, the iterative Sinkhorn pro- 
cedure for normalizing v is problematic at low T. This 
can be seen by assuming one has reached a stage where 
the matrix is close to being doubly stochastic: 



M^ = (l + ai)vij(l + f3j), 



(36) 



where v is the desired doubly stochastic matrix, while 
(Xi,l3j are assumed small. To linear order in a, f3, one 
step of row normalization corresponds to a — > —v(3 in 
matrix notation, while one step of column normalization 
yields [3 — > — v T a. Together, this gives [3 — > v T vf3. 

Hence, the asymptotic rate of convergence is deter- 
mined by the eigenvalues of the positive-definite matrix 
U = v T v. At a high temperature, U will be close to a 
uniform matrix with l/N everywhere, while at a low T, 
it will be close to the identity matrix. Consider a simple 
Ansatz for U: Uij = (1 — a)5ij + a/N, with < a < 1. 
The limit a — > 1 corresponds to high T, while a — > em- 
ulates low T. U can also be written as U — (1 — a)l+aP, 
where 1 is the identity matrix, and P is the projection 
matrix onto the uniform vector. 
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U has two distinct eigenvalues: a single unit eigenvalue 
with a uniform eigenvector (1,1,..., 1), and an (N — 1)- 
fold degenerate value 1 — a with eigenvectors in the trans- 
verse space. The unit eigenvalue is to be expected, re- 
flecting the irrelevance of shuffling a global factor between 
a and b. 

What is worse is that as T — > (a — > 0), also the 
other eigenvalue, 1 — a, tends towards unity. This is not 
a special feature of this particular Ansatz for v, but a 
generic phenomenon: As v approaches a proper assign- 
ment matrix, v T v approaches the identity matrix. This 
means that the normalization procedure inevitably runs 
into convergence problems in the limit of low T. 

Another drawback as compared to PPP is that in Sof- 
tAssign, the elements of the assignment matrix v have to 
be updated in synchrony, and there is no obvious simple 
way to update it, say, one row at a time. As a result, sta- 
bility is not guaranteed at low T even for a good solution, 
unless the cost function H is carefully tuned. 



'ing up the iterative normalization 



2. 



In order to improve convergence for the normalization 
procedure in SoftAssign at low T, it appears important 
to initialize the multiplicative row and column vectors a 
and b carefully, so as to leave as little as possible for the 
slow iterative procedure to do. 

Obviously, to avoid overflow on the computer upon 
implementation of eq. ([l6|), the effective cost matrix will 
have to be modified with suitable row and column ad- 
ditions, Cij — > Cij + on + [3j, to ensure that the smallest 
elements be zero and the rest positive. This can be done, 
e.g., by first subtracting from the elements in each row 
the lowest element in that row, and then subtracting from 
the elements in each column the lowest element in that 
column. 

This measure does not suffice, however, to guarantee a 
proper starting point. Consider N = 3 and the cost ma- 

(100\ / 1 1 \ 

o i i J . At zero T this yields M — ( j o o J , 

and the normalization procedure will be caught in an 
eternal loop, alternating forever between the two states 

/ 1/2 1/2 \ / 1 1 \ 

io ° and V2 o o . In fact, this M is not nor- 

\ 1 J \ 1/2 0/ 

malizable with finite row and column factors! 

Obviously, it is not enough to ensure at least one zero 
in each row and each column of c. The zeros must be 
arranged such that at least one combination of them will 
correspond to a one-to-one matching between rows and 
columns. Finding such a modification with otherwise 
non-negative elements corresponds precisely to solving 
the associated (effective) linear assignment problem. 

Thus, we suggest using e.g. the Hungarian method to 
transform c to the proper form, in order to guarantee a 
normalizable M for T — ► 0. 

Even this step will not guarantee a fast convergence. 
As a second example, consider N = 2 and an effective 



cost matrix c = ( ° ° J , obviously in the proper form. 
The corresponding M at T = will be ( I {). If this 
is handed to the normalization procedure at a low T, 
the approach to the final doubly stochastic matrix, v = 
(J ° J , will be merely harmonic. 

The situation is considerably improved, by in addition 
carefully balancing the cost matrix c, to ensure also a 
maximal number of non-zero elements (while maintaining 
a sufficient set of zeros) , with the smallest of these being 
as large as possible. This can be done in polynomial 
time, using a recursive procedure. For the N = 2 case 
above, the balancing step will yield c — ( ° I), giving 
M = ( I ° J at zero T, and convergence is immediate. 

Still, there will be cases for N > 2 where the procedure 
will be caught in a slowly converging sequence - this is 
inevitable, due to the unit eigenvalues at zero T -, but 
in this way the most obvious traps are avoided. 

An additional improvement is possible when using the 
Hungarian algorithm for preprocessing the cost matrix, 
by exploiting that it gives a preferred matching. This 
can be used to modify the normalization process to im- 
prove the convergence rate by updating of matched row- 
column pairs simultaneously. Especially in the low T re- 
gion, where the matched elements unambiguously define 
a selected assignment, and the corresponding elements of 
v will be close to unity and the rest small, this noticeably 
speeds up the normalization. 

Thus, the normalization constraints for a coupled row- 
column pair i,j of a modified M read 



o> I ^Mik+Mijbj )=1, 



(37) 



h (X;M fci + M4y ] =1, 

and are simultaneously satisfied by cij = x/Ai and bj = 
x/Bj, where Ai = Y^k^j M ik, B j = Y^k^i M kj and 



y/AjBj (4Afjj + AjBj) - AjBj 
2Ma 



(38) 



Each matched row-column pair of M in turn is updated 
in this manner and the process is repeated until the result 
is close to being a doubly stochastic matrix. We will refer 
to this normalization scheme as coupled normalization. 



3. Ensuring a stable dynamics 

The second major problem with SoftAssign is the lack 
of a guarantee for the stability of a solution in the low- 
T limit; this problem never appears e.g. in a properly 
handled system of Potts MF spins with serial updating. 
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To remedy this, we will have to exploit the freedom of 
adding terms to H which are trivial on shell, but never- 
theless affect the MF dynamics. One would at least like 
to ensure that in the low-T limit, SoftAssign turns into 
some form of local optimization. 

One possibility then is to demand that H(v) be trans- 
formed into a concave function of v (in the subspace con- 
sistent with v being doubly stochastic). This guarantees 
that the energy in eq. ( |35| ) for T — > will have a lo- 
cal minimum in the corner corresponding to an optimal 
assignment. 

A crude way to ensure concavity is to add a nega- 
tive quadratic diagonal term — (a/2) ^ . sfj to H, with a 
large enough coefficient a. For a quadratic H in particu- 
lar, concavity also ensures that the SoftAssign free energy 
( pq ) becomes a Lyapunov function of the dynamics at a 
fixed temperature |15) . A disadvantage with this method 
is that also non-solutions will be stabilized. Empirically, 
a smaller diagonal addition often suffices to stabilize the 
dynamics. 

A more advanced possibility is to employ problem- 
specific modifications of H, adding suitable terms to en- 
sure the stability of a good solution. It is therefore of 
interest to analyze what kinds of generic additions are 
possible without altering the on-shell costs (or merely 
adding a constant). We will refer to such terms as being 
redundant. 

Linear redundant terms: Based on decomposing the 
defining representation of the permutation group P/v, 
given by s, into the direct sum of the trivial 1-dimensional 
irrep e and the fundamental (N — l)-dimensional one f 
(see Appendix), the only possible linear redundant addi- 
tions are given by combinations of row or column sums of 
s; in SoftAssign, such additions merely lead to a modifi- 
cation of the initial row or column factors a, b in eq. (34), 



SzjSu = 0, for j ^ I, 
SijSkj = 0, for i ^ k. 



(39) 



and have no effect on the resulting doubly stochastic ma- 
trix v. 

Combinations of quadratic and linear terms: The pos- 
sibilities here stem from the decomposition of the re- 
ducible representation of P/v defined by the symmetric 
direct product of the defining representation (<*=> e + f ) 
with itself, as discussed in the Appendix. The contribu- 
tions stemming from the trivial direct product of the e 
part with itself or with f give nothing useful, correspond- 
ing to quadratic terms involving row or column sums of s, 
completely equivalent to the corresponding linear terms 
with the row or column sums replaced by 1. 

The possibly interesting terms come from f x f = e+f+ 
s + a. As discussed in the Appendix, the symmetric part 
of this consists of a part a, antisymmetric in both row 
and column indices, yielding no redundant terms, and a 
part containing e + f + s, symmetric in both row and 
column indices, which yields quadratic terms that vanish 
on shell, or equal a constant, or a linear combination 
of the elements of s. In a slighly disguised form, they 
correspond to the on-shell identities 



= 0, 



Group theory certifies that these identities suffice to gen- 
erate all possibly useful redundant additions to H(s), 
that are at most quadratic in s. Although such additions 
to H are identically zero on shell (v — > s), as additions 
to H(v) they will alter the properties of the dynamics in 
SoftAssign, in terms of a modification of the expression 
for the effective cost matrix c = dH(v) / dv. 

Thus, the addition of a term proportional to the square 
of an element Vij minus the element itself, modifies only 
the corresponding element of c, by an addition propor- 
tional to 2vij — 1. Adding the product of two elements 
of s in the same row i but in different columns j, I af- 
fects the corresponding two elements of the effective cost 
matrix, with an addition to Cjj proportional to vu, and 
vice versa. Analogously, adding to H a term involving 
the product of two elements in the same column j but 
in different rows i, k yields the addition to Cjj of a term 
proportional to Vkj and vice versa. 



4- TSP-specific modifications 

In certain quadratic problems, such as the travelling 
salesman problem (TSP), where a set of N sites is to be 
cyclically visited in an optimal order, the cost function 
has the particular structure 



H(s) = ^Tr (sDs T X) , 



(40) 



with fl.Ia pair of symmetric N x N matrices, vanishing 
on the diagonal. 

For TSP, D is the pair-distance matrix, with D a {> defin- 
ing the distance between sites a, b, while X defines the 
cyclic tour sequence neighborhood, Xij — + <5j,j_i, 

such that H measures the total tour length. 

Concavity in the subspace consistent with s being dou- 
bly stochastic, of the direct product matrix A — D x X, 
then corresponds to one of D or X being positive- 
semidefinite, and the other negative-semidefinite, each 
in the subspace orthogonal to e = (1, 1, 1 . . . 1). This can 
be ensured by suitable diagonal additions to D and X 
separately, 



D^D + al, X -> X + jl, 



(41) 



with a and 7 of opposite signs. The diagonal additions to 
D and X implies adding terms to H of the form discussed 
in the previous subsection. All vanish on shell except for 
the a x 7 term, which evaluates to a simple constant. 

For the case of TSP, often D is already negative- 
definite in the transverse subspace, and a suitable ad- 
dition to X suffices. X is easily diagonalized by means 
of a discrete Fourier transform, with the spectrum given 
by A fc = 2cos(27rfc/iV), for k = 1, . . . , N- 1. Thus, 7 = 2 
is required to make the modified X positive-semidcfinite. 
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In practice, however, 7 = 1 will suffice to stabilize the 
dynamics in most cases; in the low-T limit this is just 
enough to secure the stability of assignments locally op- 
timal with respect to local changes in the ordering of 
visited sites. 



VII. TESTS ON SIMPLE APPLICATIONS 

In order to illustrate the ideas discussed in the previous 
section, we will here test the various improvements to the 
SoftAssign algorithm, as applied to a set of small single- 
assignment problems. 

The effects of the improvements to the normaliza- 
tion algorithm at low temperatures are illustrated using 
the linear assignment problem. For TSP, the use of a 
problem-specific stabilizing term is compared to employ- 
ing a negative quadratic term — (a/2) ^ , sfy 

The SoftAssign algorithm used is described in Fig. 
All experiments have been performed on an 800MHz Pen- 
tiumlll computer running Linux. 



• Initiate the elements of v to random values close 
to 1/N, and T to a high value. 

• Repeat the following (a sweep), until the v matrix 
has saturated (i.e. become close to a (0,l)-matrix): 

— Calculate the effective cost matrix by means 
of Cij = dH/dvij, possibly modify it 
with suitable row/column additions, and let 
Mij = exp{-c i:j /T). 

— Normalize M with the proper row and col- 
umn factors to yield a doubly stochastic ma- 
trix, defining an updated v. 

— Decrease T slightly (typically by a few per- 
cent). 

• Extract the resulting solution candidate. 



FIG. 1. A SoftAssign algorithm 



A. Speeding up the iterative normalization 



As discussed in section VI B 1 , the Sinkhorn normal- 



ization of M, eq. (35), runs into trouble when the tem- 
perature is low and the corresponding v is close to an 
on-shell assignment matrix. 



To probe the efficiency of each normalization scheme, 
we use random linear assignment, eq. (]l^), where the 
costs Cij €E [0, 1] are uniform random numbers. We inves- 
tigate the number of iteration steps needed before row 
and column sums of the modified M matrix are in the 
range 1 ± A max with A max a small number, and mea- 
sure the time used by the normalization scheme. This 
is done at a set of decreasing temperatures such that 
Vij pa l/N,\/i,j for the higher values of T, while v is 
nearly on shell for the lower T values. 

We c ompare the following schemes described in section 



VIB2 



1. Plain Sinkhorn. Preprocess c by first for each row 
subtracting the smallest element, and then doing 
the same for each column. Then the Sinkhorn row 



and column normalization (eq. (35)) is applied on 
the resulting M. 

2. Hungarian+ Sinkhorn. Preprocess c using the 
Hungarian algorithm. Then normalize M using 
Sinkhorn. 

3. Hungarian+Balancing+Sinkhorn. Preprocess c us- 
ing the Hungarian and the balancing algorithms. 
Sinkhorn normalization of M. 

4. Hungarian+CoupledNorm. Preprocess c as in 2, 
then apply the cou pled row -column normalization 
described in section VI B 2 . 



5. Hungarian+Balancing+CoupledNorm. Same pre- 
processing of c as in 3, then coupled row-column 
normalization. 

Figures || and || show statistics from 100 linear assign- 
ment problems of size N = 100. The data is binned for 
different values of the saturation, £ = {1/N) J2ij v ij> rep- 
resenting different temperature regions. At high temper- 
atures the saturation is close to 1/iV, while it approaches 
one in the low temperature region. The annealing is con- 
tinued until the saturation becomes larger than 0.999, 
but is also aborted when the number of normalization 
steps exceeds a maximal value of 20000 iteration steps for 
three consecutive temperatures (which only happened for 
the plain Sinkhorn approach as discussed below); these 
data points are not included when calculating the aver- 
ages in Figs. H and ||. 

The results illustrate the efficiency of the different nor- 
malization methods used on M, and the effect of prepro- 
cessing of the cost matrix c. 



§ For a more thorough description of SoftAssign in general, 
we refer to [|lo| 
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FIG. 2. Number of normalization iterations used versus 
saturation. The data consist of averages from runs on 
100 random linear assignment problems, binned into dif- 
ferent values of the saturation. The normalization proce- 
dure is continued until all row and column sums are within 
1 ± A max with Amax = 0.01. The plot shows Plain Sinkhorn 
(+), Hungarian+Sinkhorn (□), Hungarian+CoupledNorm 
(©), Hungarian+Balancing+Sinkhorn (x), and Hungar- 
ian+Balancing+CoupledNorm (*). 

As can be seen in Fig. || the number of iterations 
needed with plain Sinkhorn grows by several orders of 
magnitude in the low temperature region, where the sat- 
uration approaches 1. What is not revealed in the figure 
is that the plain Sinkhorn scheme failed to normalize M 
at low temperatures in all of the tested problem instances 
- because of the limited resolution on the computer, small 
elements in M become zero where the corresponding el- 
ements in v should be one. The Sinkhorn scheme then 
gets stuck in an eternal loop, failing to produce a dou- 
bly stochastic v. The failure occurred at high enough 
temperatures that the saturation was below our limit of 
0.999, and the algorithm had to be aborted as described 
above. 

This problem can be avoided by either interrupting the 
annealing at an earlier stage, and extracting a solution 
from the unsaturated v matrix, or by adding additional 
redundant terms to the Hamiltonian, at the cost of a 
lower performance. However, to guarantee that an ar- 
bitrary cost matrix yields a doubly stochastic v, prepro- 
cessing of the cost matrix is essential. 

Using a Hungarian preprocessor on the cost matrix en- 
sures that the initial M has element values equal to one 
on a permutation, and values between zero and one on the 
other elements. This guarantees that at least the selected 
permutation survives in the normalizing process, and a 
doubly stochastic matrix v will always result. Though 
this sounds appealing, our tests reveal that this does not 
substantially decrease the number of Sinkhorn iterations 
as compared to the plain Sinkhorn approach, as seen in 



Fig. |. 

To avoid the extreme increase of the number of nor- 
malization iterations in the low temperature region one 
can apply balancing of the cost matrix c, or use the cou- 
pled n ormalization approach, both described in section 
VI B 2 . Applying either one (or both) of the methods de- 
creases the number of normalization iterations. This is 
evident in Fig. ||, especially in the low temperature region 
where the saturation approaches one. 

Applying the Hungarian and balancing methods to the 
cost matrix c leads to an increased time used to pro- 
duce an initial M, which is revealed in Fig. ||. The total 
time for the algorithm is dominated by the time spent on 
Hungarian and balancing. This time is nevertheless far 
exceeded by the time required with the plain Sinkhorn 
approach in the low temperature region. 

The Hungarian method is known to scale in time with 
problem size as 0(N 3 ) [Q, and the balancing routine em- 
pirically appears to behave similarly. This is to be com- 
pared to the time it takes to calculate the effective cost 
matrix, which e.g. for TSP is also an 0(N 3 ) procedure, 
while for generic quadratic assignment it scales as 0(N 4 ). 
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FIG. 3. Time used for normalization, including preprocess- 
ing of the cost matrix, versus saturation. The data consists 
of averages from runs on 100 random linear assignment prob- 
lems, binned into different values of the saturation. The nor- 
malization procedure is continued until all row and column 
sums are within 1 ± A max with A ma x = 0.01. The plot 
shows Plain Sinkhorn (+), Hungarian+Sinkhorn (□), Hun- 
garian+CoupledNorm (©), Hungarian+Balancing+Sinkhorn 
(x), and Hungarian+Balancing+CoupledNorm (*). All times 
are measured in seconds. 



B. Stable dynamics in TSP 

One of the most studied combinatorial optimization 
problems is TSP. Deterministic annealing has been ap- 
plied to it using both PPP and SoftAssign and we 
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refer to these articles for a more thorough description of 
the implementation on TSP. Here we will use TSP as an 
example where the choice of stabilizing term needed by 
SoftAssign indeed influences the performance. 

The standard assignment- matrix Hamiltonian for TSP 
is given in equation eq. (|40|). In addition to this an ex- 
tra stabilizing term is needed. We have compared the 
addition of a generic stabilizer in the form of a diagonal 
quadratic term, 



2 / j la. 



(42) 



as proposed in the literature Q, with a problem- specific 
stabilizer as discussed in section VI B 4 (X — » X + 7I), 



iab 



Sib-Dab 



(43) 



Throughout our experiments we have used the values 
1.0 for both a and 7. The a value is slightly smaller than 
the value 1.4 used in A 7 value of 1.0 ensures the 
stability with respect to local chan ges of t he ordering of 
visited sites as discussed in section VI B 4 , but does not 
always suffice to produce a proper assignment. When 
this happens (in about 1% of the tested problems) the 
algorithm is restarted, initialized with a new v. Typi- 
cally, one restart is sufficient to find a proper assignment 
matrix. 

We studied random TSP problems where the sites were 
uniformly generated in the two-dimensional unit square. 
In Fig. [| the tour lengths from 500 problems of size 100 
are shown. 
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FIG. 4. Tour lengths for 500 random two-dimensional Eu- 
clidean TSP problems of size 100, using (A) the generic sta- 
bilizer @ with a = 1.0, (B) the problem-specific stabilizer 
(Eh with 7 = 1.0. 



The generic stabilizer (eq. (f42|)) works by enhancing 
already large spin elements. Due to this, v saturates 
faster (towards an assignment matrix) in the course of 
the annealing. Since this effect is not as pronounced with 
the problem-specific stabilizer (eq. j43|)), equal annealing 
parameters will not lead to equal time used by the al- 
gorithms. Instead, parameters are chosen such that the 
respective performances are not too far from optimal and 
the times used are comparable. We have used a slower 
annealing, T — > T/1.01, for the generic stabilizer, and 
also allowed it to use up to 5 sweeps per temperature if 
the maximal change in a spin components is larger than 
0.01. An even slower annealing would not lead to a con- 
siderably better performance, in spite of the increase in 
time. With the problem-specific stabilizer, we have used 
the annealing rate T — > T/1.05, with one sweep per tem- 
perature. 
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With the generic stabilizer, the average tour length was 
9.53 and the average time used 1.53 s. With the problem- 
specific one, corresponding values where 8.39 and 3.74 s 
(including possible restarts). Thus, performance- wise, 
the problem-specific stabilizer is superior, while the in- 
crease in time used can be attributed to the slower satu- 
ration - this might be avoided by adding a small generic 
stabilizer term. 



VIII. CONCLUSIONS 

We have investigated the possibilities of defining a de- 
terministic annealing approach to nonlinear assignment 
problems, in anology to existing algorithms for Ising and 
Potts systems. 

We have analyzed a proper variational approach, where 
the problem cost function is approximated by a varia- 
tional cost, linear in the assignment matrices. For a single 
assignment problem this allows for an iterative scheme to 
minimize the variational free energy at a given temper- 
ature. Combined with annealing, this can be used as a 
deterministic annealing algorithm. 

As an aside, the generalization to multiple assignment 
problems is straightforward. Assuming additive linear 
contributions to the variational cost from the different as- 
signments leads to a mean-field-like approximation with 
a factorized Boltzmann distribution, and the variational 
parameters for the individual assignments can be up- 
dated in a serial manner. 

A major problem with the proper variational approach, 
however, is that it requires the calculation of permanents, 
which needs exponential time (in problem size). This 
implies that considering this as a general heuristic for 
large nonlinear assignment problems is not feasible. 

Abandoning the quest for a proper variational method 
for nonlinear assignment, we have also studied Potts- 
based methods as a more promising alternative, although 
per se not tailored for assignment problems. The cur- 
rently most appealing method of this type is the Soft- 
Assign algorithm, and we have proposed some improve- 
ments to it. 

The Sinkhorn normalization procedure used in Soft As- 
sign runs into convergence problems at low temperatures. 
We present arguments why this is unavoidable, and pro- 
pose proper adjustments of the effective cost matrix to 
reduce the effect. The application of a Hungarian prepro- 
cessing to the effective cost matrix guarantees that the 
Sinkhorn procedure always produces a doubly stochastic 
matrix. An additional balancing of the cost matrix de- 
creases the number of iteration needed by the normaliza- 
tion. In addition we devise an alternative normalization 
procedure which is easily implemented when a Hungar- 
ian preprocessing is used. It is superior to the Sinkhorn 
procedure at low temperatures. We have experimentally 
confirmed these statements by implementation of Soft- 
Assign on random instances of linear assignment. With 



other problem ensembles, however, we have experienced 
varying effects of the improvements, in some cases they 
appear essential, while in others they are more or less 
superfluous. 

Another problem with the SoftAssign approach is the 
lack of guarantee for stability in the low temperature re- 
gion: Solutions may not be stable. This problem can be 
resolved by adding to the Hamiltonian a stabilizer - a re- 
dundant term that affects the dynamics without altering 
the on-shell cost. We have used arguments from group 
theory to determine the possible types of redundant addi- 
tions that are at most quadratic in the spin components. 
As an example we discuss how such redundant terms can 
be used for the travelling salesman problem, and propose 
a TSP-specific stabilizing term different from the generic 
one normally used with SoftAssign. In numerical exper- 
iments we show that this enhances the performance. 
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APPENDIX A: BASIC GROUP THEORY FOR P N 

Here we will briefly review elements of the basic group 
theory for the permutation group of N elements, G = 
P N . 



with u e = 1, while f is a non-trivial (N — l)-dimensional 
irrep, the fundamental representation. For the e part, 
e.g., the corresponding N x 1-dimensional projection ma- 
trix is given by P? h = 1/yN. 



2. Irrep Expansion 

Due to the identity ^ r d 2 r = V, there are as many dis- 
tinct matrix elements in the inequivalent irreps as there 
are elements in the group. As is well known, these ele- 
ments form a complete orthogonal basis in group space, 
as expressed by 

. , V 

u ij\9)Kl(9) = -^&r,A,k5j,l (orthogonality), (A5) 



gee 



1. Representations and irreps 

G has a finite number of inequivalent irreducible rep- 
resentations, or irreps; the squares of their dimensions 
sum up to the size V of the group, V = N\, If r labels 
an irrep, let d r be its dimension, and let the associated 
d r x d r matrices be denoted u r (g). We have J2 r d^ = V. 

A D-dimensional reducible representation {U(g)} can 
be decomposed into the direct sum of (not necessarily 
distinct) irreps {r M } as 



(Al) 



or, in matrix form, U(g) = ^2 P fJ, u rf ' (g)P ,lT , where P M 
is a (^-independent) D x d r matrix that projects out 
the part of a vector that belongs to the associated irrep 
r = r^. It can be seen as a submatrix of an orthogonal 
D x D matrix V, used to similarity transform U to an 
explicitly blocked form Ub, U(g) — VUFi{g)V T . 

The ortogonality of V implies the following properties 
for the matrices P M : 



(A2) 
(A3) 



where 1^ denotes the d x d ide ntity matrix. Inverting the 
similarity transform, eq. (Al) is equivalent to 



(A4) 



^^""(s), expressing 



or, in matrix form, P M U (g)P 
the similarity transform of U to blocked form, with /z, v 
labeling respectively the row and column block. 

The assignment matrices s define a particular A^- 
dimensional representation (the defining representation) 
of P^v. It is reducible if N > 1, being the direct sum e + f 
of two irreps, where e is the trivial one-dimensional irrep 



d r Uij(g)utj(h) = VS gi h (completeness). (A6) 

r i,j=l 

Thus, any function F over the group can be expressed in a 
unique way as a linear combination of the irrep elements, 



(A7) 



where C are coefficients, and u r (g) is the orthogonal ma- 
trix representing the gro up e lement g in the irrep r. Due 
to the completeness, eq. (A7) can be inverted to yield the 
coefficients uniquely as 



Linear expressions in s 



(A8) 



Eq. (A4) can be interpreted as follows: Independently 
of the group element g, certain linear combinations of the 
elements of the matrix U(g) representing g in a reducible 
representation R, will be identical to the elements of the 
orthogonal matrix u r (g) corresponding to an irrep r that 
appears in R; certain other linear combinations (corre- 
sponding to fi v ) will vanish. Together, these span a 
complete basis in the space of all possible linear combina- 
tions. This can be used to identify the redundant linear 
or quadratic expressions in the assignment matrix s. 

A linear function of the assignment matrix s can have 
non-vanishing coefficients only for r — e and r = f in 
its expansion ( [A7| ) . p*| Separating the e and f parts of s 
yields 



** Which shows that the most general cost function is not 
linear in s. 
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s l3 ; = l/N + Y,PiAiPfi- (A9) 



kl 



The different versions of eq. ( A4) then yield a set of iden- 
tities, 





= N, 


(A10) 


ij 


= 0, 


(All) 


ij 


= o, 


(A12) 




= u li(g)- 


(A13) 



The first three of these express in a slightly disguised 
form the constraints of unit row and column sums in s. 



b. Quadratic expressions in s. 

A quadratic expression in s means a linear combination 
of products Uikji = SijSki, defining the direct product 
representation (e + f ) x (e + f), considering the pair of 
row indices {i, k} as a composite row index, and the pair 
of column indices {j, 1} likewise as a composite column 
index. 

It is reducible, and the corresponding version of 
eq. (|A4|) reads 

E Q*,m«<j(9)»MQ i!,n = M^Cff). (AW) 
ik.jl 

The non-trivial part comes from f x f, which reduces 
to e + f + s + a (for N > 2), where s and a are two 
new irreps, of dimensionalities d s = N(N — 3)/2 and 
da=(JV-l)(JV-2)/2. 

Due to the obvious symmetry of [/, Uik.ji = Uki t ij, 
it is natural to divide the resulting irreps in two sets, 
according to the symmetry with respect to swapping the 
index pair ik in Q^ k . Thus, the symmetric part of f x f 
contains e + f + s, while the antisymmetric part contains 
a alone. 

With opposite symm etry type in the row and column 



parts, the LHS of eq. ( A14 ) vanishes identically. Thus, 



the interesting parts require /x, v to correspond to the 
same type of symmetry. The antisymmetric part con- 
tains only the nontrivial part [i = v = a, yielding it a . 
In the symmetric part, the fi = v = s part similarly 
yields u s , while the remaining combinations yield prod- 
ucts of elements of s for which the RHS will either vanish 
(p =/= v), or yield a constant (/j, = v = e), or a linear com- 
bination of the elements of s (fi — v = f); the LHS then 
defines candidates for redundant quadratic additions to 
a cost function. 
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